266 PART 5 Looking for Relationships with Correlation and Regression
Heads Up: Knowing What Can Go Wrong with Logistic Regression
Logistic regression presents many of the same potential pitfalls as ordinary least-squares regression (see Chapters 16 and 17), as well as several that are specific to logistic regression. Watch out for these common pitfalls:
» Don’t fit a logistic function to non-logistic data: Don’t use logistic regression to fit data that doesn’t behave like the logistic S curve. Plot your grouped data (as shown earlier in Figure 18-1b). If it’s clear that the fraction of positive outcomes isn’t leveling off at Y = 0 or Y = 1 for very small or very large X values, then logistic regression is not the correct modeling approach. The Hosmer-Lemeshow (H-L) test described earlier in the section “Assessing the adequacy of the model” provides a statistical test of whether your data are consistent with a logistic model. Also, in Chapter 19, we describe a more generalized logistic model that contains additional parameters for the upper and lower leveling-off values.
» Watch out for collinearity and disappearing significance: In any kind of regression, when two or more predictor variables are strongly related to each other, you can be plagued with problems of collinearity, such as individually important predictors losing their statistical significance once the others are in the model. We describe this problem in Chapter 17, and potential modeling solutions in Chapter 20.
» Check for inadvertent reverse-coding of the outcome variable: The outcome variable should always be coded as 1 for a yes outcome and 0 for a no outcome (refer to Table 18-1 for an example). If the outcome variable in your data set is coded using characters, recode it to 0/1 yourself. Don’t leave this step to an automated function in the program, because it may inadvertently reverse the coding so that 1 = no and 0 = yes. This reversal won’t affect any p values, but it will make all your ORs and their CIs the reciprocals of what they should be, meaning they will refer to the odds of no rather than the odds of yes.
» Don’t misinterpret odds ratios for categorical predictors: Categorical predictors should be coded numerically as we describe in Chapter 8. Make sure that proper indicator-variable coding is used and that these variables are introduced properly in the model, as described in Chapter 17. Also, be careful not to misinterpret odds ratios for numerical predictors, and be mindful of the complete separation problem, as described in the following sections.
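The first pitfall (“Don’t fit a logistic function to non-logistic data”) starts with looking at grouped data. The Python sketch below shows one way to compute the numbers behind such a plot: a hypothetical helper, `grouped_fractions`, sorts the observations by X, splits them into roughly equal-sized groups, and reports the fraction of positive outcomes in each group. The dose-response data are made up for illustration, and this is an eyeball check, not a substitute for the H-L test.

```python
def grouped_fractions(x, y, n_groups=5):
    """Sort (x, y) pairs by x, split them into n_groups roughly equal-sized
    groups, and return (mean x, fraction of y == 1) for each group."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    summary = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue  # happens only if there are more groups than observations
        mean_x = sum(xi for xi, _ in chunk) / len(chunk)
        frac_positive = sum(yi for _, yi in chunk) / len(chunk)
        summary.append((mean_x, frac_positive))
    return summary

# Hypothetical dose-response data: y = 1 for a positive outcome, 0 otherwise.
doses = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
responses = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

groups = grouped_fractions(doses, responses, n_groups=5)
for mean_dose, frac in groups:
    print(f"dose ~{mean_dose:4.1f}: fraction positive = {frac:.2f}")
```

Here the fractions run 0.00, 0.00, 0.50, 1.00, 1.00, flattening near 0 and 1 at the extremes as a logistic curve would. Fractions that stall well inside the 0-to-1 range at the extremes (say, around 0.3 or 0.7) would be the warning sign described in the text.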
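To make the reverse-coding pitfall concrete, the Python sketch below recodes a text outcome to 0/1 explicitly, failing loudly on unexpected values instead of guessing. It then computes the odds ratio for a single yes/no predictor from its 2×2 table. For one binary predictor, this cross-product OR is the same OR a logistic regression would report. The helper names `recode_yes_no` and `odds_ratio` and the data are hypothetical. Flipping the outcome coding turns an OR of 9 into 1/9, its reciprocal.

```python
def recode_yes_no(values):
    """Explicitly map 'yes' -> 1 and 'no' -> 0; raise on anything else."""
    mapping = {"yes": 1, "no": 0}
    coded = []
    for v in values:
        key = v.strip().lower()
        if key not in mapping:
            raise ValueError(f"unexpected outcome value: {v!r}")
        coded.append(mapping[key])
    return coded

def odds_ratio(exposed, outcome):
    """Cross-product odds ratio (a*d)/(b*c) for 0/1 predictor and 0/1 outcome."""
    pairs = list(zip(exposed, outcome))
    a = sum(1 for e, o in pairs if e == 1 and o == 1)
    b = sum(1 for e, o in pairs if e == 1 and o == 0)
    c = sum(1 for e, o in pairs if e == 0 and o == 1)
    d = sum(1 for e, o in pairs if e == 0 and o == 0)
    if b * c == 0:
        raise ValueError("a zero cell makes the odds ratio undefined")
    return (a * d) / (b * c)

# Hypothetical data: exposure (1 = exposed) and a character-coded outcome.
exposed = [1, 1, 1, 1, 0, 0, 0, 0]
raw_outcome = ["yes", "yes", "yes", "no", "yes", "no", "no", "no"]

outcome = recode_yes_no(raw_outcome)       # 1 = yes, 0 = no (correct)
reversed_outcome = [1 - y for y in outcome]  # 1 = no, 0 = yes (the mistake)

or_correct = odds_ratio(exposed, outcome)
or_reversed = odds_ratio(exposed, reversed_outcome)
print(f"OR with 1 = yes: {or_correct:.3f}")
print(f"OR with 1 = no:  {or_reversed:.3f}")
```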
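For the categorical-predictor pitfall, the Python sketch below shows one common form of indicator (dummy) coding. A variable with k levels becomes k − 1 columns of 0/1, and the omitted reference level is the group each resulting OR is compared against. The function name `indicator_columns` and the smoking-status data are hypothetical illustrations, not the specific scheme from Chapter 8 or Chapter 17.

```python
def indicator_columns(values, reference):
    """Return {level: list of 0/1} with one indicator column per
    non-reference level; the reference level gets no column."""
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found")
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }

# Hypothetical predictor with three levels; "never" is the reference.
smoking = ["never", "former", "current", "never", "current"]
columns = indicator_columns(smoking, reference="never")
for level, col in columns.items():
    print(level, col)
```

With this coding, the OR for the `current` column compares current smokers with never-smokers (not with everyone else), which is the interpretation the text warns you to get right.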